The Improved BiCG Method for Large and Sparse Linear Systems on Parallel Distributed Memory Architectures
Authors
Abstract
For the solution of large and sparse linear systems of equations with unsymmetric coefficient matrices, we propose an improved version of the BiConjugate Gradient method (IBiCG), based on [5, 6], which uses the Lanczos process as a major component and combines elements of numerical stability and parallel algorithm design. For the Lanczos process, stability is obtained by a coupled two-term procedure that generates Lanczos vectors scaled to unit length. The algorithm is derived such that all inner products, matrix-vector multiplications and vector updates of a single iteration step are independent, so the communication time required for the inner products can be overlapped efficiently with the computation time of the vector updates. Therefore, the cost of global communication on parallel distributed memory computers can be significantly reduced. The resulting IBiCG algorithm maintains the favorable properties of the Lanczos process without increasing computational costs. A data distribution suitable for both irregularly and regularly structured matrices, based on an analysis of the non-zero matrix elements, is presented. The communication scheme overlaps the execution of computation and communication to reduce waiting times. The efficiency of this method is demonstrated by numerical experiments carried out on a massively parallel distributed memory system.
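To make concrete which operations the abstract refers to, here is a minimal sketch of the classical, unpreconditioned BiCG iteration in pure Python. It is not the IBiCG variant proposed in this paper, whose recurrences are not given in the abstract. The names `CSRMatrix`, `dot`, `axpy` and `bicg` are hypothetical and introduced only for illustration. The comments mark the inner products that would become global reductions on a distributed memory machine; in this classical form the updates of `x`, `r` and the shadow residual cannot start until the reduction for `sigma` has finished, which is the kind of dependency the abstract says IBiCG removes so that communication can overlap with vector updates.

```python
from typing import List, Tuple

Vector = List[float]


class CSRMatrix:
    """Minimal square CSR (compressed sparse row) matrix, for illustration only."""

    def __init__(self, n: int, indptr: List[int], indices: List[int], data: List[float]):
        self.n = n
        self.indptr = indptr    # row i occupies data[indptr[i]:indptr[i+1]]
        self.indices = indices  # column index of each stored nonzero
        self.data = data        # value of each stored nonzero

    def matvec(self, x: Vector) -> Vector:
        """Return A @ x."""
        y = [0.0] * self.n
        for i in range(self.n):
            for k in range(self.indptr[i], self.indptr[i + 1]):
                y[i] += self.data[k] * x[self.indices[k]]
        return y

    def rmatvec(self, x: Vector) -> Vector:
        """Return A^T @ x by scattering each row's entries into their columns."""
        y = [0.0] * self.n
        for i in range(self.n):
            for k in range(self.indptr[i], self.indptr[i + 1]):
                y[self.indices[k]] += self.data[k] * x[i]
        return y


def dot(u: Vector, v: Vector) -> float:
    # On a distributed memory machine this would be a local partial sum
    # followed by a global reduction, i.e. a synchronization point.
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Return y + alpha * x."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]


def bicg(A: CSRMatrix, b: Vector, tol: float = 1e-10, maxiter: int = 1000) -> Tuple[Vector, int]:
    """Classical BiCG with x0 = 0 and shadow residual chosen as r~0 = r0."""
    x = [0.0] * A.n
    r = list(b)                # r0 = b - A x0 = b
    rt = list(r)               # shadow residual r~
    p, pt = list(r), list(rt)  # search directions p and p~
    rho = dot(rt, r)           # reduction: (r~, r)
    bnorm = dot(b, b) ** 0.5
    for k in range(maxiter):
        if dot(r, r) ** 0.5 <= tol * bnorm:   # reduction: residual norm
            return x, k
        q = A.matvec(p)
        qt = A.rmatvec(pt)
        sigma = dot(pt, q)                    # reduction: (p~, A p)
        if rho == 0.0 or sigma == 0.0:        # exact-zero test only; real codes use a threshold
            raise ArithmeticError("BiCG breakdown")
        alpha = rho / sigma                   # all vector updates below wait for sigma
        x = axpy(alpha, p, x)
        r = axpy(-alpha, q, r)
        rt = axpy(-alpha, qt, rt)
        rho_new = dot(rt, r)                  # reduction: (r~, r) for the next step
        beta = rho_new / rho
        rho = rho_new
        p = axpy(beta, p, r)                  # p  = r  + beta * p
        pt = axpy(beta, pt, rt)               # p~ = r~ + beta * p~
    return x, maxiter


# Usage: a small unsymmetric, diagonally dominant 3x3 system.
A = CSRMatrix(3, [0, 2, 5, 7], [0, 1, 0, 1, 2, 1, 2], [4.0, 1.0, 2.0, 5.0, 1.0, 1.0, 3.0])
x, iters = bicg(A, [1.0, 2.0, 3.0])
print(iters, x)
```

Each `dot` call inside the loop is a separate global synchronization once the vectors are distributed across processors, and the matrix-vector products with both `A` and its transpose need neighbor communication; the example matrix is chosen only so that the iteration terminates in a few steps.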
Similar Resources
The Non-Symmetric s-Step Lanczos Algorithm: Derivation Of Efficient Recurrences And Synchronization-Reducing Variants Of BiCG And QMR
The Lanczos algorithm is among the most frequently used iterative techniques for computing a few dominant eigenvalues of a large sparse non-symmetric matrix. At the same time, it serves as a building block within biconjugate gradient (BiCG) and quasi-minimal residual (QMR) methods for solving large sparse non-symmetric systems of linear equations. It is well known that, when implemented on dist...
Computation of Dendrites on Parallel Distributed Memory Architectures
A code for simulating the solidification of a pure material from its undercooled melt based on a phase field approach has been written for parallel distributed memory architectures using MPI. The numerical scheme is based on finite differences and results in large sparse non-linear systems which are solved by a backtracking line search modification of Newton's method combined with GMRES. Experiments c...
An improved parallel block Lanczos algorithm over GF(2) for integer factorization
RSA is one of the most popular algorithms for public-key cryptosystems. The security of this algorithm relies on the difficulty of factoring large integers. GNFS is the most efficient algorithm for factoring large integers over 110 digits, and solving the large sparse linear system over GF(2) is one of the most time-consuming steps in the GNFS. In the thesis proposal, an improved and more effic...
Parallel IQMR Method for Unsymmetric Large and Sparse Linear Systems in Computational Fluid Dynamics
We mainly examine the application of the improved version of the quasi-minimal residual (IQMR) method [20], [21] for the solutions of linear systems of equations with unsymmetric coefficient matrices arising from the discretization of fluid dynamic problems on massively parallel distributed memory computers. We will deal with implicit finite difference schemes for solving the Euler equations. These sc...
Application Interface to Parallel Dense Matrix Libraries: Just let me solve my problem!
We focus on how applications that lead to large dense linear systems naturally build matrices. This allows us to explain why traditional interfaces to dense linear algebra libraries for distributed memory architectures, which evolved from sequential linear algebra libraries, inherently do not support applications well. We review the application interface that has been supported by the Parallel Lin...